Performance of Two Statistical Indexing Methods, with and without Compound-word Analysis
Abstract
In Germanic languages, compound words are very common and compounding is highly productive. Some compounds are bound and lexicalized and lose their semantic content when split (e.g. albatross or jordgubbe); these will be referred to as opaque compounds. Their opposite is the productive compounds, whose parts keep their semantic value when separated (Bjarnadóttir 2003). Among these are compounds that are used, and sometimes invented, for a special context (e.g. indexeringsmetod).
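The distinction between opaque and productive compounds can be illustrated with a minimal sketch. It is not the method evaluated in the paper; it assumes a toy lexicon, a hand-made list of opaque compounds, and a simple rule for the Swedish linking "s". The function name `split_compound` and all word lists are hypothetical.

```python
def split_compound(word, lexicon, opaque, min_part=3):
    """Split a productive compound into two known parts.

    Returns [word] unchanged when the word is listed as opaque
    or when no split into two lexicon entries is found.
    """
    word = word.lower()
    # Opaque compounds lose their meaning when split, so keep them whole.
    if word in opaque:
        return [word]
    # Try every split point that leaves at least min_part letters on each side.
    for i in range(min_part, len(word) - min_part + 1):
        head, tail = word[:i], word[i:]
        # Assumption: a trailing "s" on the head may be a linking element.
        candidates = [head]
        if head.endswith("s"):
            candidates.append(head[:-1])
        for h in candidates:
            if h in lexicon and tail in lexicon:
                return [h, tail]
    return [word]


# Toy data (hypothetical): a tiny lexicon and one opaque compound.
LEXICON = {"indexering", "metod", "jord", "gubbe"}
OPAQUE = {"jordgubbe"}

productive = split_compound("indexeringsmetod", LEXICON, OPAQUE)
kept_whole = split_compound("jordgubbe", LEXICON, OPAQUE)
naive = split_compound("jordgubbe", LEXICON, set())
```

Here `productive` holds the two meaningful parts of indexeringsmetod ("indexing method"), while `kept_whole` leaves jordgubbe ("strawberry") intact. Without the opaque list, `naive` splits it into jord ("earth") and gubbe ("old man"), which is the kind of semantic loss the abstract describes.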
Similar resources
Comparing the Effect of Syntactic vs. Statistical Phrase Indexing Strategies for
In this paper we describe the results of experiments contrasting syntactic phrase indexing with statistical phrase indexing for Dutch texts. Our results showed that we at least need a compound splitting algorithm for good quality retrieval for Dutch texts. If we then add either syntactic or statistical phrases, performance generally improves, but this effect is never statistically significant. If...
A Supervised Approach to Keyword Extraction from Persian Documents Using Lexical Chains
Keywords are the main focal points of interest within a text and are intended to represent the principal concepts outlined in the document. Determining the keywords using traditional methods is a time-consuming process and requires specialized knowledge of the subject. For the purposes of indexing the vast expanse of electronic documents, it is important to automate the keyword extraction task. S...
Uchiyama, Kiyoko, Timothy Baldwin and Shun Ishizaki (2005) Disambiguating Japanese Compound Verbs, Computer Speech and Language, Special Issue on Multiword Expressions, Volume 19, Issue 4, pp. 497-512
The purpose of this study is to disambiguate Japanese compound verbs (JCVs) using two methods: (1) a statistical sense discrimination method based on verb-combinatoric information, which feeds into a first-sense statistical sense disambiguation method, and (2) a manual rule-based sense disambiguation method which draws on argument structure and verb semantics. In evaluation, we found that the ru...
Disambiguating Japanese compound verbs
The purpose of this study is to disambiguate Japanese compound verbs (JCV) based on two methods: (1) a statistical method which makes use of collocational or semantic information about different verb combinations, and (2) a manual rule-based method which utilises verbal and nominal semantic features. We also present a combined method where the output of the statistical method is fed into the ru...
Compound Noun Segmentation Based on Lexical Data Extracted from Corpus
Compound noun analysis is one of the crucial problems in Korean language processing because a series of nouns in Korean may appear without white space in real texts, which makes it difficult to identify the morphological constituents. This paper presents an effective method of Korean compound noun segmentation based on lexical data extracted from a corpus. The segmentation is done in two steps: ...